A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks
نویسندگان
چکیده
Reward engineering is an important aspect of reinforcement learning. Whether or not the users’ intentions can be correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often requires parameter tuning to obtain the desired behavior. This operation can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. TL formula can be translated to a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy that satisfies the TL specification. A set of simulated experiments are conducted to evaluate the proposed approach.
منابع مشابه
A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks
Reinforcement learning has been applied to many interesting problems such as the famous TD-gammon [1] and the inverted helicopter flight [2]. However little effort has been put into developing methods to learn policies for complex persistent tasks and tasks that are time-sensitive. In this paper we take a step towards solving this problem by using signal temporal logic (STL) as task specificati...
متن کاملTemporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison
Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving difficult RL problems, but few rigorous comparisons have been conducted. Thus, no general guidelines describing the methods’ relative strengths and weakne...
متن کاملContinuous-action reinforcement learning with fast policy search and adaptive basis function selection
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the community of artificial intelligence and machine learning. However, the generalization ability of RL is still an open problem and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In t...
متن کاملReinforcement Learning of Fuzzy Logic Controllers for Quadruped Walking Robots
This paper presents a fuzzy logic controller (FLC) for the implementation of some behaviour of Sony legged robots. The adaptive heuristic Critic (AHC) reinforcement learning is employed to refine the FLC. The actor part of AHC is a conventional FLC in which the parameters of input membership functions are learned by an immediate internal reinforcement signal. This internal reinforcement signal ...
متن کاملTransfer Learning for Policy Search Methods
An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference (Sutton & Barto, 1998) approach to transfer in reinforcement learning (Sutton & Barto, 1998) tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1709.09611 شماره
صفحات -
تاریخ انتشار 2017